Abstract

Background: DNA is a carrier of biological information. The hybridization process, the formation of the DNA 
double-helix from single-strands with complementary sequences, is important for all living cells. DNA microarrays, 
among other biotechnologies such as PCR, rely on DNA hybridization. However, to date the thermodynamics of 
hybridization is only partly understood. Here we address, experimentally and theoretically, the hybridization of 
oligonucleotide strands of unequal lengths, which form a bulged loop upon hybridization. For our study we use 
in-house synthesized DNA microarrays. 

Results: We synthesize a microarray with additional thymine bases in the probe sequence motifs so that bulged 
loops occur upon target hybridization. We observe a monotonic decrease of the fluorescence signal of the 
hybridized strands with increasing length of the bulged loop. This corresponds to a decrease in duplex binding 
affinity within the considered loop lengths of one to thirteen bases. By varying the position of the bulged loop 
along the DNA duplex, we observe a symmetric signal variation with respect to the center of the strand. We 
reproduce the experimental results well using a molecular zipper model at thermal equilibrium. However, binding 
states between both strands, which emerge through duplex opening at the position of the bulged loop, need to 
be taken into account. 

Conclusions: We show that stable DNA duplexes with a bulged loop can form from short strands of unequal 
length and they contribute substantially to the fluorescence intensity from the hybridized strands on a microarray. 
In order to reproduce the results with equilibrium thermodynamics, it is essential (and to a good
approximation sufficient) to consider duplex opening not only at the ends but also at the position of the bulged
loop. Although the thermodynamic parameters used in this study are taken from hybridization experiments in 
solution, these parameters fit our DNA microarray data well. 



Background 

The hybridization process, the formation of the well-known double-helix structure from two complementary nucleic acid strands (such that A·T and C·G base pairs are formed), is pivotal to all living organisms. Important biotechnological methods such as PCR and DNA microarray technology rely on it.

DNA microarrays consist of regularly spaced domains of surface-attached probe sequences, which act as binding sites for their complementary, fluorescently labeled target sequences in solution. The probe sequence and position of each domain on the surface are known, and the amount of bound target DNA can be determined quantitatively. Microarrays are important in many biotechnological methods such as gene expression profiling, where complex mixtures of target oligonucleotides need to be analyzed in a highly parallel manner [1-3].

Because molecular recognition between DNA strands is highly sensitive, DNA microarrays can in principle detect even small sequence deviations. However, DNA targets that are not perfectly complementary can also form duplexes with the surface-bound probes, although these duplexes are less stable than those formed by the perfectly matching counterpart (PM).

Although DNA microarrays are widely used in biological and biotechnological applications, the physical mechanisms underlying the hybridization process are poorly understood. Data analysis is mostly based on empirical, statistical methods [4-6]. To fully exploit the potential of DNA microarray technology, it is desirable to pursue a more fundamental approach to the stability of hybridized or partly hybridized strands.
Molecular simulations have greatly increased our understanding of DNA dynamics, thermal fluctuations and hybridization. DNA hybridization and mechanical properties of DNA in the presence of surfaces, e.g. the persistence length, have been investigated at the molecular level [7-9]. Molecular simulations give a very detailed view of the molecular dynamics. Here, however, we are interested in a simple scheme to assess the stability of bulged loops on a DNA microarray. Systematic experiments on short bulged loops have hardly been performed.

The standard model for hybridization in solution is the so-called two-state nearest-neighbor model (NN model). It treats the formation of the DNA duplex as a two-state process in which the duplex is either fully hybridized or fully denatured [10,11]. The model calculates the binding free energy of a perfectly complementary double-stranded duplex by summing the nearest-neighbor interaction parameters (10 experimentally determined free energy parameters [12-14]). These parameters take into account that DNA stability arises from hydrogen bonding and base stacking interactions. Furthermore, the model can be extended to include single base mismatch (MM) defect parameters [13]. This model has proved very successful for predicting the duplex melting temperature $T_m$ in solution.

In several experiments, incorporated MMs have had a position-dependent influence on the fluorescent signal. Zhang et al. proposed the position-dependent nearest-neighbor model (PDNN) [15], in which the binding free energy of the duplex is calculated as a weighted sum of the nearest-neighbor parameters. The weight parameters are determined empirically.
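The structure of the PDNN sum can be illustrated in a few lines: the same NN terms are summed, but each is multiplied by a position-dependent weight. In PDNN the weights are fitted empirically. The tent-shaped `tent_weight` profile and the toy energies `TOY_NN` used here are hypothetical and only show the form of the weighted sum.

```python
from typing import Callable, Mapping


def pdnn_free_energy(seq: str, nn_energy: Mapping[str, float],
                     weight: Callable[[int, int], float]) -> float:
    """Weighted NN sum: sum over k of weight(k, n_pairs) * nn_energy[seq[k:k+2]]."""
    n_pairs = len(seq) - 1
    return sum(weight(k, n_pairs) * nn_energy[seq[k:k + 2]] for k in range(n_pairs))


def tent_weight(k: int, n_pairs: int, depth: float = 0.5) -> float:
    """Hypothetical weight profile: 1 in the middle of the duplex,
    1 - depth at both ends."""
    span = max(n_pairs - 1, 1)
    return 1.0 - depth * abs(2.0 * k / span - 1.0)


TOY_NN = {"AC": 1.0, "CG": 2.0, "GT": 1.0}  # toy energies, arbitrary units

print(pdnn_free_energy("ACGT", TOY_NN, lambda k, n: 1.0))  # 4.0 (plain NN sum)
print(pdnn_free_energy("ACGT", TOY_NN, tent_weight))       # 3.0
```

When all weights equal 1, the weighted sum reduces to the ordinary two-state NN sum.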

In the past, we experimentally and theoretically investigated the effect of single MMs on duplex stability on a DNA microarray for the case in which the lengths of probe and target match [16-18]. We showed that a two-state NN model could not predict the MM binding affinities precisely. We therefore developed a different theoretical approach based on a double-ended molecular zipper [19-21]. The double-ended molecular zipper assumes that the duplex can only open from the ends. This simplification is justified because states in which base pairs located away from the duplex ends are open are less stable. This holds even if a single MM is incorporated into the duplex. When the heterogeneity of the binding affinities due to synthesis defects was taken into account, the model reproduced the DNA microarray data. We have also shown that the double-ended zipper model maps onto the PDNN model, whereas the former is derived from first principles [16]. The purpose of this study is to



investigate the case where probe and target have 
unequal lengths and bulged loops form upon hybridiza- 
tion. Bulged loops are referred to as loops in the follow- 
ing. With our DNA microarray setup, loops of different 
lengths and at different positions can be obtained in a 
controlled manner by inserting additional bases into the 
perfectly matching probe sequence. The formation of 
loops increases the complexity of the hybridized state: 
new binding states between probe and target strands 
may emerge. We show that a good reproduction of the 
experimental data remains possible with the molecular 
zipper, but only if duplex opening can also occur at the 
loop position. 

Methods 

DNA Microarray Hybridization Experiments 

We use in-house synthesized DNA microarrays. All employed protocols are provided in Naiser et al. [18]. These include the preparation of dendrimer-functionalized microarray substrates, the light-directed synthesis (a "maskless" photolithographic technique based on NPPOC phosphoramidites) and the data analysis methods. The only differences from the previously published experimental setup are a more homogeneous illumination of the microarray surface and an increased resolution due to improved optics.

To avoid target-target interactions and competitive hybridization effects, only one target species (see Table 1) is employed in the hybridization experiments. Probes are attached to the microarray surface via their 3' end. The hybridization temperature is 317 K.

Images of the hybridized DNA microarray are taken 
for data analysis after thermal equilibrium is reached. 

To test whether the microarray surface has a significant influence on hybridization, we repeat our experiments with the reversed probe sequence: the 5' end of the sequence employed throughout this work corresponds to the 3' end of the test sequence. No influence of the microarray surface on hybridization could be detected (see additional file 1: Influence of the microarray surface on the hybridization signal).

Probe Design 

To generate single-stranded DNA loops in the probe- 
target-duplexes, we introduce additional poly-T- 
sequences into the PM sequence. This is illustrated in 
Figure 1. The poly-T-sequence (black) is located 
between the red and the green parts of the probe strand. 

Table 1 Target sequence

Target sequence                          Length (bases)
5'-aag™tgatgagta™atggtototaatg-3'        33

Cy3-labeled perfectly matching target sequence in solution.




Figure 1 Formation of a complementary strand by bulging of the non-complementary sequence. The green and the red duplex parts of 
the two strands are of complementary sequence. Complete hybridization can occur only if the black portion forms a bulged loop. 



The green and red parts of the probe strand are complementary to the corresponding parts of the target strand. Upon hybridization, the black part forms a loop. By varying the length of the poly-T sequence and the position at which it is introduced into the probe motif, loops of different lengths at different positions can be generated.

In this way, poly-T loops of up to 13 bases at 20 different positions along the strand are generated during in situ synthesis, giving 260 different probe sequences. To control the synthesis quality, 20 PM features are added. A single "feature block" consists of these 280 features arranged in a square (see Figure 2). This feature block is synthesized four times on the microarray. Table 2 lists the synthesized probe sequences.
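The probe design can be written as a small enumeration. This sketch assumes a hypothetical 33-base stand-in `PM_MOTIF` for the real motif of Table 2. It inserts a poly-T run after base number $P$, counted from the surface-attached end, and reproduces the counts given above: 260 loop probes plus 20 PM controls give 280 features per block.

```python
# Hypothetical 33-base stand-in for the PM probe motif (written from the
# surface-attached 3' end); the real motif is listed in Table 2.
PM_MOTIF = "ACGT" * 8 + "A"


def insert_loop(pm: str, position: int, length: int, base: str = "T") -> str:
    """Insert `length` copies of `base` after base number `position`
    (1-based, counted from the surface-attached end of the motif)."""
    if not 0 < position < len(pm):
        raise ValueError("loop position must lie inside the motif")
    return pm[:position] + base * length + pm[position:]


def design_feature_block(pm: str, positions=range(7, 27),
                         lengths=range(1, 14), n_pm: int = 20):
    """Return the loop probes keyed by (position, length) and the PM controls."""
    loop_probes = {(p, l): insert_loop(pm, p, l) for p in positions for l in lengths}
    return loop_probes, [pm] * n_pm


loop_probes, pm_controls = design_feature_block(PM_MOTIF)
print(len(loop_probes), len(loop_probes) + len(pm_controls))  # 260 280
print(len(loop_probes[(7, 13)]))                              # 46
```

The probes are keyed by (position, length) rather than by sequence. This keeps designs distinct even when inserting T next to an existing T makes two designs produce the same string.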

We also synthesized probes with loop sequences other than poly-T and investigated the influence of poly-C sequences and random sequences on duplex stability as a function of loop length. The results are




Figure 2 Hybridization signals scaled with respect to PM grid 

The intensity of each feature inside the feature block can be 
normalized taking into account the average signal of four 
corresponding PM features (green arrows). With this data, 
illumination gradients are corrected linearly across the feature block. 



provided in additional file 2: Duplex stability of DNA duplexes with bulged loops of different sequences as a function of loop length. We did not observe a significant difference in the loop-length dependence of the fluorescent signal compared with the poly-T sequences.

Data Acquisition 

To determine the fluorescence intensities ("hybridization signals") that hybridized, fluorescently labeled target molecules produce at the microarray features, we take images of the DNA microarray surface with a fluorescence microscope. Figure 2 shows such an image. A feature block (see Probe Design for the definition) is surrounded by PM features. These PM features help to control the illumination quality during synthesis and microscopic observation. For each feature inside the feature block, there are four corresponding PM features (green arrows). The signal of each feature is normalized with respect to the average signal of its four PM features. In this way, synthesis-related illumination gradients can be canceled out, at least to linear order. To reduce experimental error, we reproduce the same feature block on the microarray at four different locations and take the average of the normalized signals of these four feature blocks as the final data set. Errors can arise from inhomogeneities of the microarray surface, fluorescent stains in the feature blocks or illumination gradients during synthesis. Figure 3 shows the hybridization signals of the final data set as a function of loop length and loop position. The hybridization temperature is 317 K. The strongest and weakest hybridization signals are normalized to 1 and 0, respectively. For further details, see [18,22].
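The normalization steps described above can be sketched with NumPy. The array layout and the synthetic numbers are assumptions for illustration: one signal per feature, an array with four columns holding the assigned PM signals of each feature, and one entry per feature block. The actual analysis pipeline is described in [18].

```python
import numpy as np


def normalize_block(raw: np.ndarray, pm_neighbors: np.ndarray) -> np.ndarray:
    """Divide each feature signal by the mean of its four PM features.
    raw: shape (n_features,); pm_neighbors: shape (n_features, 4)."""
    return raw / pm_neighbors.mean(axis=1)


def final_data_set(raw_blocks, pm_blocks) -> np.ndarray:
    """Average the normalized signals over the feature blocks, then rescale
    so that the strongest signal is 1 and the weakest is 0."""
    normalized = np.stack([normalize_block(r, p) for r, p in zip(raw_blocks, pm_blocks)])
    mean = normalized.mean(axis=0)
    return (mean - mean.min()) / (mean.max() - mean.min())


# Synthetic example: four identical feature blocks with three features each.
raw_blocks = [np.array([2.0, 4.0, 6.0])] * 4
pm_blocks = [np.full((3, 4), 2.0)] * 4
print(final_data_set(raw_blocks, pm_blocks))
```

Dividing by the local PM mean removes a gradient that varies slowly across the block. The final min-max rescaling matches the convention of Figure 3.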

Results and Discussion 

Binding Affinities as a Function of Loop Length 

Figure 4a shows the hybridization signal as a function of loop length, averaged over all loop positions. The intensity of the PM is set to 1. The signal decreases monotonically with increasing loop length. Inserting a single base already reduces the hybridization intensity to about 85% of the PM signal. Thirteen additional bases (the largest number under study) reduce the signal to about 60% of the PM signal. In a zipper model, MMs in the middle of the duplex affect duplex stability most, because they are included in many of the possible states considered in the partition function. The employed probe strands are short compared with the DNA sequences used in other applications. This explains why inserting a single additional base causes a seemingly unusually strong decrease in signal intensity.

Table 2 Probe sequences

Loop position   Loop length   Probe sequence                                    Length (bases)
PM probe        0             3'-tcaatactactcataa™ccaacaaca™c-5'                33
7               1             3'-tcaatatctactcataa™ccaacaaca™c-5'               34
7               2             3'-TCAATATTCTACTCATAA™CCAACAACA™C-5'              35
7               3-12          3'-TCAATATT..CTACTCATAATOCCAACAACATOC-5'          36-45
7               13            3'-tcaatatttttttttttttctactcataa™ccaacaaca™c-5'   46
8               1             3'-tcaatacttactcataa™ccaacaaca™c-5'               34
8               2             3'-TCAATACTTTACTCATAA™CCAACAACA™C-5'              35
...             ...           ...                                               ...
26              13            3'-TCAATACTACTCATAATTACCAACA…TTTACATOC-5'         46

Synthesized probe sequences (features) on the DNA microarray surface. The additional thymine bases are inserted after the base given by the loop position. Altogether, there are 261 different features (260 loop-containing probes and the PM probe).




Figure 3 Fluorescent signals as a function of loop length and loop position. Loop lengths L vary from 0 (PM) to 13 additional thymine bases; loop positions P vary from 7 to 26. Loop position P means that the loop-forming additional bases are inserted after base number P of the probe motif (counted from the surface). The strongest signal is set to 1, the weakest signal to 0. The hybridization temperature is 317 K.



Binding Affinities as a Function of Loop Position 

Figure 4b shows the measured hybridization signals as a function of loop position in 3' to 5' direction, averaged over all loop lengths (PM signal set to 1). The resulting "loop position defect profile" is symmetric with respect to the center of the duplex. The signal is strongest for loops at the ends and in the middle of the duplex. It is weakest for loops at a distance of about 3-4 bases from the center. The difference between maximum and minimum is only about 10%. This variation is weak compared with the dependence of the hybridization signal on loop length.



Figure 4 Experimentally determined fluorescent signals.

Symbols: feature block 1, blue upward-pointing triangles; feature block 2, cyan circles; feature block 3, green downward-pointing triangles; feature block 4, magenta squares; average of all feature blocks, solid black line. a) Fluorescent signals as a function of loop length (from loop length 0 (PM) to loop length 13), averaged over all loop positions. The PM signal is set to 1. b) Hybridization signals as a function of loop position, averaged over all loop lengths. Loop position 7 indicates that the loop is inserted after base number 7 of the probe motif, counted from the surface.

The following arguments explain, at least qualitatively, how the fluorescent signal depends on loop position (see Figure 5):

Loops positioned close to either end of a duplex have fewer potential binding sites towards that end, and the duplex can open there to form a dangling end. However, in this case a large part of the duplex on the opposite side of the loop remains strongly bound (Figure 5a).

Loops located at a central position have many possible binding partners to the left and to the right. This results in a closed loop and higher duplex stability (Figure 5b).

Between these two extremes, the hybridization signal drops to a minimum. On the one hand, these loops have fewer binding partners than loops in the middle of the duplex. On the other hand, the large hybridized part is shorter than for loops at end positions (Figure 5c).

Thermodynamics of DNA Hybridization 

At equilibrium, single-stranded probes $P$ and target molecules $T$ form a duplex $D$ with rate constant $k_+$ and denature with rate constant $k_-$:




Figure 5 Signal dependence on loop position. a) A loop at a duplex end may result in dangling ends owing to the low probability of duplex closing. The large green region may provide high duplex stability and a strong fluorescent signal. b) If the loop is located towards the center of the duplex, the duplex is likely to be closed on both sides of the loop, because both sides offer a large number of possible binding partners. c) At a position between end and center of the strand, the duplex is less stable than in a) and b).



$$P + T \underset{k_-}{\overset{k_+}{\rightleftharpoons}} D \qquad (1)$$



This process can be described by a Langmuir-type adsorption isotherm. Since the targets were in excess in our experiments, the target concentration $[T] = [T_0]$ is considered constant. The fraction of hybridized probes $\theta$ is

$$\theta = \frac{[D]}{[P_0]} = \frac{K\,[T_0]}{1 + K\,[T_0]} \qquad (2)$$

where $K = k_+/k_-$ is the equilibrium binding constant of the probe-target duplex. Since the fluorescent signal of the array is proportional to the fraction of hybridized probes, we refer to $\theta$ as the "hybridization signal" in the following.
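For orientation, Eq. (2) and its inverse are shown below in a few lines of Python. The function names are hypothetical, and concentrations are given in arbitrary but consistent units. Because $K[T_0] = \theta/(1-\theta)$, raising the signal of a single Langmuir isotherm from 10% to 90% requires an 81-fold increase of $K[T_0]$.

```python
def langmuir_signal(K: float, T0: float) -> float:
    """Fraction of hybridized probes, Eq. (2): theta = K*T0 / (1 + K*T0)."""
    x = K * T0
    return x / (1.0 + x)


def binding_constant_for_signal(theta: float, T0: float) -> float:
    """Inverse of Eq. (2): K = theta / ((1 - theta) * T0), for 0 <= theta < 1."""
    return theta / ((1.0 - theta) * T0)


ratio = binding_constant_for_signal(0.9, 1.0) / binding_constant_for_signal(0.1, 1.0)
print(round(ratio, 6))  # 81.0
```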

The Langmuir-type adsorption isotherm (2) has a very narrow transition region from low to high binding affinity. Our data from previous experiments, however, exhibit a broadened transition region. As we have shown [16,17], this broadening is due to the heterogeneity of binding affinities caused by unavoidable sequence defects during in situ synthesis. It is therefore necessary to describe the situation with a distribution of binding constants $K_i$. The hybridization signal of an individual probe $i$ with random defects then reads:



$$\theta_i = \frac{K_i\,[T_0]}{1 + K_i\,[T_0]} \qquad (3)$$



Assuming that the synthesis defects follow a binomial distribution with probability $p$ that a defect occurs, the hybridization signal $\theta$ of a single feature is:



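$$\theta = \sum_{k'=0}^{N'} x_{k'} \binom{N'}{k'}^{-1} \sum_{i \in \mathcal{S}_{k'}} \frac{K_i\,[T_0]}{1 + K_i\,[T_0]} \qquad (4)$$

where the inner sum runs over the set $\mathcal{S}_{k'}$ of the $\binom{N'}{k'}$ probe sequences with $k'$ synthesis defects.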



$N'$ is the number of bases in the probe, $k'$ the number of synthesis defects and $x_{k'}$ the probability that $k'$ synthesis defects occur in a probe of length $N'$:



$$x_{k'} = \binom{N'}{k'}\, p^{k'}\, (1-p)^{N'-k'} \qquad (5)$$



To minimize computation time, synthesis defects are only considered up to a maximum number $k'_{\max}$ per strand. The bases of the loop are treated as free of synthesis defects. Most of the time, the loop bases are only weakly bound or not bound at all, since there are almost no complementary bases in the target strand. Synthesis defects in the loop therefore need not be considered.

In our case, $N' = 33$, and we took up to $k'_{\max} = 3$ synthesis defects into account. This generates $\binom{33}{0} + \binom{33}{1} + \binom{33}{2} + \binom{33}{3} = 1 + 33 + 528 + 5456 = 6018$ different probe sequences. Strands with more than 3 synthesis defects can be neglected (see additional file 3: Influence of the number of MMs on the fluorescent signal).
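The following sketches the defect average of Eqs. (4) and (5), truncated at $k'_{\max}$. Each particular configuration of $k'$ defect positions occurs with probability $p^{k'}(1-p)^{N'-k'} = x_{k'}/\binom{N'}{k'}$, and enumerating the configurations confirms the count of 6018. The binding-constant model `K_of` is a hypothetical stand-in; in the paper, $K_i$ follows from the zipper partition function. The values of $p$ and $[T_0]$ in the example are also hypothetical.

```python
from itertools import combinations
from math import comb
from typing import Callable, Iterator, Tuple


def defect_configurations(n_bases: int, k_max: int) -> Iterator[Tuple[int, ...]]:
    """All sets of defect positions with at most k_max defects."""
    for k in range(k_max + 1):
        yield from combinations(range(n_bases), k)


def feature_signal(K_of: Callable[[Tuple[int, ...]], float], T0: float, p: float,
                   n_bases: int = 33, k_max: int = 3) -> float:
    """Eq. (4) truncated at k_max: a configuration with k defects has
    probability p**k * (1 - p)**(n_bases - k) = x_k / C(n_bases, k)."""
    theta = 0.0
    for defects in defect_configurations(n_bases, k_max):
        k = len(defects)
        weight = p**k * (1.0 - p) ** (n_bases - k)
        K = K_of(defects)
        theta += weight * K * T0 / (1.0 + K * T0)
    return theta


n_configs = sum(1 for _ in defect_configurations(33, 3))
print(n_configs, sum(comb(33, k) for k in range(4)))  # 6018 6018

# Hypothetical K model for illustration only: each defect lowers K tenfold.
print(feature_signal(lambda d: 1e3 * 0.1 ** len(d), T0=1e-3, p=0.01))
```

Because `K_of` receives the actual defect positions, a position-dependent model such as the zipper can be plugged in without changing the averaging code.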

In the following, we calculate the binding constants $K_i$ as a function of loop position $P$ and loop length $L$ in thermodynamic equilibrium.

Partition Function of the Double-ended Zipper 

The partition function of the double-ended zipper model is [19-21]:

$$Z_D = \sum_{k=0}^{N-1} \sum_{l=k+1}^{N} \omega_{k,l} = \sum_{k=0}^{N-1} \sum_{l=k+1}^{N} e^{\Delta G_{k,l}/RT} \qquad (6)$$

Here, $N$ is the number of NN pairs and $\omega_{k,l}$ is the statistical weight of the partially denatured state $S_{k,l}$; $k$ and $l$ are the positions of the left and right zipper fork, respectively. $\Delta G_{k,l}$ is the sum of the NN free energies $\Delta g_i^\circ$ ($\Delta g_i^\circ > 0$) of the zipped duplex section plus the initialization term:



$$\Delta G_{k,l} = \sum_{i=k}^{l-1} \Delta g_i^\circ + \Delta g_{\mathrm{init}} \qquad (7)$$

$\Delta g_{\mathrm{init}} = -4.5$ kcal/mol is the duplex initialization free energy [17]. Taking the totally denatured state $S_0$ as the reference state, the binding constant is $K_i(P, L) = Z_D(P, L)$.
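Eqs. (6) and (7) translate directly into a double loop over the fork positions. This minimal sketch uses the paper's sign convention ($\Delta g_i^\circ > 0$ stabilizing, $\Delta g_{\mathrm{init}} = -4.5$ kcal/mol) and prefix sums, so that each $\Delta G_{k,l}$ costs O(1). The list of NN free energies is assumed to be given. A single MM would enter by replacing the two affected NN values (Figure 6b). Function names and the example energies are hypothetical.

```python
import math

R = 1.987204e-3  # gas constant in kcal/(mol K)


def zipper_partition_function(dg, dg_init=-4.5, T=317.0):
    """Double-ended zipper, Eqs. (6)-(7):
    Z_D = sum_{k=0}^{N-1} sum_{l=k+1}^{N} exp(dG_{k,l} / RT),
    dG_{k,l} = sum_{i=k}^{l-1} dg[i] + dg_init, with dg[i] > 0 stabilizing."""
    N = len(dg)
    prefix = [0.0]
    for g in dg:
        prefix.append(prefix[-1] + g)
    RT = R * T
    Z = 0.0
    for k in range(N):
        for l in range(k + 1, N + 1):
            Z += math.exp((prefix[l] - prefix[k] + dg_init) / RT)
    return Z


# Hypothetical example: 32 NN pairs (a 33-mer) with 1.5 kcal/mol each.
K = zipper_partition_function([1.5] * 32)
print(f"{K:.3e}")
```

The double loop visits $N(N+1)/2$ states, i.e. 528 states for a 33-mer ($N = 32$).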

Figure 6 illustrates the double-ended zipper model and the corresponding notation. The duplex is hybridized between the zipper forks at positions $k$ and $l$. This







Figure 6 Double-ended zipper model of DNA hybridization. a) The duplex can only open and close from the ends in a zipper-like fashion. Between the two zipper fork positions $k$ and $l$, the duplex is closed. $\Delta G_{k,l}$ is the sum over all closed NN parameters in the bound duplex section. b) One MM affects two NN pairs.



corresponds to the free energy $\Delta G_{k,l}$. Duplex opening and closing occur only at the ends, indicated by the black arrows to the left and right of the duplex. In Figure 6b, a single MM is incorporated into the duplex.

Loop energy penalties 

We have shown that including MM defect parameters in a zipper model is sufficient to account for single base defects [17]. In the following, we test this simple model for the case of loops. For single-stranded DNA loops, we calculate purely entropic energy penalties by treating the DNA loop as a self-avoiding random walk (SAW) on a lattice. Since duplex opening can only occur from the ends, the DNA loops are always closed, and only SAWs that return to the origin need to be considered. For the number of SAWs of length $l$ returning to the origin, in the limit $l \to \infty$ [23,24]:



$$\#_{\mathrm{origin}}(l) \propto \sigma\, \mu^{l}\, l^{-c} \qquad (8)$$

$\sigma = 1.75 \cdot 10^{-4}$ is the so-called cooperativity parameter, $\mu$ is the connectivity constant and $c = 2.15$ is the loop closure exponent. $\sigma$ and $c$ are universal constants, whereas $\mu$ depends on the considered geometry ($\mu = 4.684$ is used here).

For the total number of SAWs of length $l$ over all possible SAW configurations [23,25]:

$$\#_{\mathrm{total}}(l) \propto \mu^{l}\, l^{\gamma-1} \qquad (9)$$

$\gamma = 1.157 \pm 3 \cdot 10^{-3}$ is the (universal) entropic exponent. This gives the probability $p(l)$ that a SAW of length $l$ returns to the origin:



$$p(l) = \frac{\#_{\mathrm{origin}}(l)}{\#_{\mathrm{total}}(l)} \propto \sigma\, l^{-(c+\gamma-1)} \qquad (10)$$



Given $p(l)$, we can calculate the entropy $S(l)$ and the corresponding loop energy penalty $\Delta G_{\mathrm{entropy}}(l)$:

$$S(l) = N_A\, k_B\, \ln[p(l)], \qquad \Delta G_{\mathrm{entropy}}(l) = -T\, S(l) \qquad (11)$$



The length of a DNA loop is determined by the number of bases $L$ in the loop and the distance $a_0$ between two adjacent bases. The length of a random walk is the number of steps from start to end on a lattice with lattice parameter $p_0$. When treating a DNA loop as a SAW, one has to consider the persistence length of single-stranded DNA, which determines the number of steps in the SAW and defines the lattice parameter $p_0$. $p_0$ and $a_0$ are of comparable magnitude, with values that depend on the salt concentration [26-28]. We therefore treat a DNA loop of length $L \cdot a_0$ as a SAW with $L$ steps on a lattice with lattice parameter $p_0 \approx a_0$ (salt concentration is 0.90 M NaCl and 50 mM NaH$_2$PO$_4$). We also tested the influence of $p_0$ on the absolute values of the loop energy penalties $\Delta G_{\mathrm{entropy}}$ and found the differences to be negligible. Thus:

$$\Delta G_{\mathrm{entropy}}(L) = -T\, S(L) \qquad (12)$$

where $L$ ranges from 1 to 13.
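The loop penalty of Eqs. (8)-(12) can be evaluated numerically. The sketch assumes that the unknown proportionality prefactors in Eqs. (8) and (9) cancel, so that $p(l) = \sigma\, l^{-(c+\gamma-1)}$ (the factor $\mu^l$ cancels exactly). It uses the constants quoted in the text at $T = 317$ K. Names are hypothetical.

```python
import math

R = 1.987204e-3   # R = N_A * k_B in kcal/(mol K)
SIGMA = 1.75e-4   # cooperativity parameter
C_LOOP = 2.15     # loop closure exponent c
GAMMA = 1.157     # entropic exponent


def return_probability(l: int) -> float:
    """p(l) for a closed SAW loop of l steps, assuming equal prefactors
    in Eqs. (8) and (9)."""
    return SIGMA * l ** -(C_LOOP + GAMMA - 1.0)


def loop_penalty(L: int, T: float = 317.0) -> float:
    """Delta G_entropy(L) = -T * S(L) with S(L) = R * ln p(L), in kcal/mol (> 0)."""
    S = R * math.log(return_probability(L))
    return -T * S


for L in (1, 5, 13):
    print(L, round(loop_penalty(L), 2))
```

Under this assumption, $-T\,S(L) = -RT\ln\sigma + (c+\gamma-1)\,RT\ln L$. The penalty for a single extra base is therefore about 5.45 kcal/mol, and it grows only logarithmically with $L$.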

Figure 7 compares our experimental signals with the predictions of the simple zipper model: a) hybridization signals as a function of loop length averaged over all loop positions; b) hybridization signals as a function of loop position averaged over all loop lengths. The symbols indicate the signals of each feature block, the solid black line is the average signal of all feature blocks and the solid red line represents the theoretically predicted signals. The experimental results cannot be reproduced with the simple zipper.

Extension of Double-ended Zipper Model 

So far, duplex opening was only possible from the ends of the duplex. However, states in which the duplex zips at the loop position are essential for correctly reproducing our experimental results.

The partition function of a duplex $Z_D(P, L)$ as a function of loop position $P$ and loop length $L$ can be decomposed into a sum of five terms:







Figure 7 Experimental data and theoretical signals calculated with a double-ended zipper (no opening at loop position)

Symbols: feature block 1, blue upward-pointing triangles; feature block 2, cyan circles; feature block 3, green downward-pointing triangles; feature block 4, magenta squares; average of all feature blocks, solid black line; calculated signals, solid red line. a) Dependence of the experimentally observed fluorescent signals on loop length after averaging over all loop positions, compared with the calculated fluorescent signals averaged over all loop positions. b) Dependence of the experimentally observed fluorescent signals on loop position after averaging over all loop lengths, compared with the calculated fluorescent signals averaged over all loop lengths.




$Z_{\mathrm{zipper}}(P, L)$: The hybridized strands zip from both ends. The partition function is the one presented in the section above.

$Z_{\mathrm{extended,right}}(P, L)$: Probe and target strands to the right of the loop position can adopt every possible binding configuration with each other, not only zipper states. Thus, loops of different sizes can appear in probe and target strand. The duplex part to the left of the loop position zips towards and away from the loop position; the free energy of this part is accounted for by $\Delta G_{\mathrm{left}}$. Figure 8 illustrates $Z_{\mathrm{extended,right}}$. The red and green dashed lines represent hybridized duplex parts. The black dashed lines are denatured (at the end) or form a loop between probe and target (in the middle). In this particular case, probe and target strand form loops of 16 and 11 bases, respectively. The two strands reunite after base 28 of the probe strand and base 23 of the target strand, and the following 7 bases are hybridized. This results in the free energy $\Delta G_{7,28,23}$.

$Z_{\mathrm{extended,left}}(P, L)$: analogous to $Z_{\mathrm{extended,right}}(P, L)$, but on the opposite side.

$Z_{\mathrm{double\ zipper}}(P, L)$: Both parts, left and right of the loop position, behave like independent zippers. Adding $Z_{\mathrm{extended,right}}(P, L)$ and $Z_{\mathrm{extended,left}}(P, L)$ counts these states twice, so this partition function needs to be subtracted.

$Z_{\mathrm{non\text{-}canonical}}(P, L)$: This partition function sums over all non-canonical binding states that occur simultaneously on both sides of the loop position. As we show below, this term can in principle be neglected because all of these binding states bind only weakly.

In the full expression for $Z_{\mathrm{extended,right}}$, the summation over all possible binding configurations between both strands to the right of the loop position depends on the zipping state $S_{k,l}$ of the duplex part to the left of the loop position. This makes the calculation computationally intensive. To reduce the computing time of the model, we approximate $Z_{\mathrm{extended,right}}$ by (Figure 8):

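$$Z_{\mathrm{extended,right}}(P, L) = \sum_{n=2}^{N-P+1} \sum_{i=P}^{N+L-n+1} \sum_{j=P}^{N-n+1} \omega_{n,i,j} \qquad (13)$$

with $\omega_{n,i,j} = e^{(\Delta G_{n,i,j} + \Delta G_{\mathrm{entropy,right}} + \Delta G_{\mathrm{left}})/RT}$ and $\Delta G_{n,i,j} = \sum_{r=1}^{n-1} \Delta g_r^\circ$.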

$i$ and $j$ mark the positions of the zipper forks in probe and target, respectively. In the region to the right of $i$ and $j$, $n$ bases of probe and target strand ($n - 1$ NN pairs) are hybridized. Thus, $\Delta G_{n,i,j}$ is the NN energy of the $n - 1$ NN pairs that are hybridized starting from zipper fork position $i$ in the probe and zipper fork position $j$ in the target. $\Delta g_r^\circ$ is the NN energy of a single NN pair in this region.

Figure 8 Notation of $Z_{\mathrm{extended,right}}$. Each line represents one base. A loop of length $L = 9$ (nine additional thymine bases) is inserted after base number 12 of the PM probe motif (loop position $P = 12$). Red lines represent hybridized duplex parts (Watson-Crick base pairing), black lines denatured parts; the black lines in the middle of the probe include the additional loop bases. To the right of the loop position $P$, probe and target strand can bind to each other in every possible conformation, resulting in Watson-Crick and non-Watson-Crick base pairing (green lines). The free energy of this part of the duplex is accounted for by $\Delta G_{7,28,23}$. In this way, bulged loops of different lengths form in probe and target strand. Here, loops of $M_1 = 16$ bases and $M_2 = 11$ bases are formed in probe and target, respectively. Both loops reunite after base number $i = 28$ of the probe and base number $j = 23$ of the target strand. The black, rightmost part of the duplex is denatured and not considered in the computation. To reduce computing time, the duplex part to the left of the loop position is considered hybridized at all times. The free energy $\Delta G_{\mathrm{left}}$ of this part is calculated with the help of the zipper model (6).



The free energy $\Delta G_{\mathrm{left}}$ of the duplex part to the left of the loop position is approximated using the zipper model (6). Since $\Delta G_{\mathrm{left}}$ is a function of the loop position $P$ only and is independent of the current zipping state $S_{k,l}$, computing time is greatly reduced:



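$$\Delta G_{\mathrm{left}} = \Delta G_{\mathrm{left}}(P) = RT \cdot \ln\left[\sum_{k=0}^{P-2} \sum_{l=k+1}^{P-1} \omega_{k,l}\right] \qquad (14)$$

with $\omega_{k,l}$ defined as before.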



We calculate the binding constants

$$K_i(P, L) = Z_{\mathrm{zipper}}(P, L) + Z_{\mathrm{extended,right}}(P, L) + Z_{\mathrm{extended,left}}(P, L) - Z_{\mathrm{double\ zipper}}(P, L) \qquad (15)$$

for one duplex sequence, both with and without our approximation (13) for $Z_{\mathrm{extended,right}}(P, L)$ (and (20) for $Z_{\mathrm{extended,left}}(P, L)$ below). To fit the theoretical signals to the experimental data, we use a scaling factor $C$ that links the calculated binding constants to the fluorescent intensity values ($C$ is a free parameter):



$$\theta_i = \frac{C \cdot K_i\,[T_0]}{1 + C \cdot K_i\,[T_0]} \qquad (16)$$



Figure 9 shows that our approximations for $Z_{\mathrm{extended,right}}$ (13) and for $Z_{\mathrm{extended,left}}(P, L)$ (20) are excellent if $C$ is adjusted properly. If the same factor $C$ is used for the calculations with and without approximation, the red and black curves differ in absolute values, but their shapes remain very similar (left side of Figure 9). By adjusting $C$, the two curves can be made to overlap (right side). The reason is that the approximations (13) and (20) neglect some binding states, which results in (slightly) smaller overall binding constants.

In the extended model, loops in probe and target can start at a common origin and end at some position $\vec{r}$. We now obtain for





Figure 9 Comparison between the signals calculated with and without approximation for $Z_{\text{extended,right}}$ and $Z_{\text{extended,left}}$. Symbols: signal calculation without approximation, black solid line; signal calculation with approximation, red solid line. We compare the calculated fluorescent signals resulting from $K_t$ for a duplex sequence without synthesis defects to test our approximation for $Z_{\text{extended,right}}$ and $Z_{\text{extended,left}}$. a) If the same scaling factor $C = C_1 = C_2$ is used, the two curves differ in absolute values, but the shape is very similar. b) By choosing two different scaling factors $C_1$, $C_2$, the two curves overlap very well. This shows that our approximation for $Z_{\text{extended,right}}$ and $Z_{\text{extended,left}}$ is excellent.
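two SAWs with $M_1$ and $M_2$ steps:

$$\Delta G_{\text{entropy,right}} = \Delta G_{\text{entropy,right}}(M_1, M_2) = -k_B T \cdot \ln\left[p(M_1, M_2)\right], \quad \text{with } M_1 = i - P \text{ and } M_2 = j - P \qquad (17)$$

$p(M_1, M_2)$ is the probability that two SAWs with $M_1$ and $M_2$ steps, respectively, start at the origin and meet again. Here, we have:

$$p(M_1, M_2) = \sum_{\vec{r}} \frac{\#(M_1, \vec{r}) \cdot \#(M_2, \vec{r})}{\#_{\text{total}}(M_1) \cdot \#_{\text{total}}(M_2)}, \quad \text{with } |\vec{r}| \le \min(M_1, M_2) \qquad (18)$$

$\#(M_i, \vec{r})$ is the number of SAWs with $M_i$ steps that start at the origin and end at position $\vec{r}$. In 3D [23]:

$$\#(M_i, \vec{r}) \propto \mu^{M_i} \cdot M_i^{\gamma - 1 - 3\nu} \cdot g\!\left(\frac{|\vec{r}|}{M_i^{\nu}}\right), \quad \text{with } g(x) \propto e^{-\lambda x^{\delta}},\ \lambda > 0, \text{ and } \delta = \frac{1}{1 - \nu} \qquad (19)$$

The constants $\gamma$ and $\mu$ are defined as before; $\nu = 0.588 \pm 1.5 \cdot 10^{-3}$ is the (universal) metric exponent. $\#_{\text{total}}(M_i)$ is the total number of SAWs of $M_i$ steps (as defined in equation (9)).

In an analogous manner, $Z_{\text{extended,left}}$ is calculated:

$$Z_{\text{extended,left}}(P, L) = \sum_{n=2}^{P} \sum_{i=0}^{L+P-n} \sum_{j=0}^{P-n} \omega_{n,i,j}, \quad \text{with } \omega_{n,i,j} = e^{-\frac{\Delta G_{n,i,j} + \Delta G_{\text{entropy,left}} + \Delta G_{\text{right}}}{RT}} \qquad (20)$$

and $\Delta G_{n,i,j} = \sum_{r=1}^{\ldots} \Delta g_r$.

Here we have

$$\Delta G_{\text{right}} = \Delta G_{\text{right}}(P) = -RT \cdot \ln\left(\sum_{k=\ldots}^{N-1} \sum_{l=\ldots}^{N} \omega_{k,l}\right) \qquad (21)$$

and finally

$$\Delta G_{\text{entropy,left}} = \Delta G_{\text{entropy,left}}(M_1, M_2) = -k_B T \cdot \ln\left[p(M_1, M_2)\right], \quad \text{with } M_1 = L + P - n - i \text{ and } M_2 = P - n - j \qquad (22)$$

For $Z_{\text{double zipper}}(P, L)$, we have:

$$Z_{\text{double zipper}}(P, L) = \sum_{k,l} \sum_{o,p} \omega_{k,l,o,p}, \quad \text{with } \omega_{k,l,o,p} = e^{-\frac{\Delta G_{k,l} + \Delta G_{\text{entropy,double}} + \Delta G_{o,p}}{RT}} \qquad (23)$$

and $\Delta G_{\text{entropy,double}}$:

$$\Delta G_{\text{entropy,double}}(M_1, M_2) = -k_B T \cdot \ln\left[p(M_1, M_2)\right], \quad \text{with } M_1 = o - l + L - 1 \text{ and } M_2 = o - l - 1 \qquad (24)$$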

If the probe and target lengths match, duplex zipping can only occur when the two strands are perfectly aligned. We consider the initiation energy, which is the entropic barrier to meeting this constraint, as constant. We simply write $K_t = Z_D$ and include the initiation energy in a prefactor.

For duplexes with loops, the probe-target length difference $\Delta L$ increases the number of possible conformations of the probe strand that do not promote duplex initiation. The initiation energy changes accordingly. If we neglect unfolding of the coils during duplex formation, the number of pairing collisions that do not lead to zipping grows linearly with $\Delta L$. This results in an initiation entropy change



$$\Delta S_{\text{init}} \propto -\ln\left(1 + \frac{\Delta L}{L_0}\right) \qquad (25)$$



$L_0$ is the characteristic length of the problem, which is the persistence length (under our experimental conditions, this corresponds to a single base). For a short, loop-forming sequence located in the center of the strand, however, there are two positions where parallel but shifted probe and target strands can initiate duplex formation. These positions correspond to the matching sequences left and right of the loop, which implies a correction of $\Delta L/L_0$ by a factor of 1/2. If the loop forms towards the ends, however, we are close to the single-strand situation described above. In the following, we neglect this dependence on loop position and use a factor of 1/2 throughout. Neither factor (1 or 1/2) drastically modifies our result if the factor C is adjusted accordingly.

Our approximation for $\Delta S_{\text{init}}$ tends to overestimate the corresponding initiation energy penalty as $\Delta L$ increases. The situation is different for large $\Delta L$. In this case, the separated matching sequences are almost independent, and the initiation energy tends to its asymptotic value for two independent hybridization events. Consequently, a weaker dependence of the initiation energy on $\Delta L$ can be expected for large $\Delta L$.
From (25), we get the modified binding constant $K_t^{*}$:

$$K_t^{*} = \frac{Z_D(P, L)}{1 + \frac{\Delta L}{2 L_0}} \qquad (26)$$

The calculation of the hybridization signal is then straightforward.

We note that the choice of the denominator in equation (26), which follows from (25), affects the calculated hybridization signals. Our theory could possibly be improved by choosing a different denominator. This, however, is a subtle problem in itself and beyond the scope of this paper.
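The correction in (26) is a simple rational factor. The sketch below evaluates it for the loop lengths ΔL = 1 to 13 used in the experiments, with $L_0$ = 1 base as stated above. The `factor` argument switches between the two choices (1 or 1/2) discussed in the text, and the names are hypothetical. The loop also checks numerically that the factor equals $\exp(\Delta S_{\text{init}}/R)$ if one reads (25) as $\Delta S_{\text{init}} = -R \ln(1 + \text{factor} \cdot \Delta L / L_0)$. That reading is our assumption.

```python
import math


def initiation_factor(delta_l, l0=1.0, factor=0.5):
    """1 / (1 + factor * Delta L / L0); factor = 1/2 reproduces (26)."""
    return 1.0 / (1.0 + factor * delta_l / l0)


def corrected_binding_constant(z_d, delta_l, l0=1.0, factor=0.5):
    """K_t* = Z_D(P, L) * initiation_factor(Delta L)."""
    return z_d * initiation_factor(delta_l, l0, factor)


for delta_l in range(1, 14):
    # Entropy change in units of R, assuming Delta S_init = -R ln(1 + Delta L / (2 L0)).
    ds_over_r = -math.log(1.0 + 0.5 * delta_l)
    assert abs(math.exp(ds_over_r) - initiation_factor(delta_l)) < 1e-12
    print(delta_l,
          round(initiation_factor(delta_l), 3),               # factor 1/2
          round(initiation_factor(delta_l, factor=1.0), 3))   # factor 1
```

Printing both columns shows how much the choice of factor changes the relative penalty. Per the text, this difference is largely absorbed by refitting C.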

Figure 10 compares our experimental results with theory: a) hybridization signals as a function of loop length for one specific loop position; b) hybridization signals as a function of loop position for one specific loop length. In the panels on the right, we give the 95% confidence intervals for our data points (black) and compare them to our theory (red). This shows that the experimentally observed trends and their reproduction by our model are statistically relevant. The different symbols indicate the signals of the different feature blocks as a function of loop position or length. The solid black line is the experimental average, and the solid red line represents the theoretical predictions.

To make the signal dependence on loop length 
clearer, we present the hybridization signals averaged 
over all loop positions as a function of loop length and 
compare them to the predicted signals (upper part of 
Figure 11). The lower part of the same figure shows the 
signal dependence as a function of loop position after 
averaging over all loop lengths. The symbols represent 
the signals of the feature blocks, the solid black line the 
average signal of all feature blocks and the solid red line 
represents the predicted signals. 
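The normalization and averaging steps described above can be written compactly. The sketch assumes background-subtracted intensities scaled so that the PM probe gives 1, which is the convention stated explicitly in Additional file 3 (PM set to 1, background set to 0). It also assumes a plain arithmetic mean over feature blocks and loop positions. The function names and the nested-list layout are hypothetical.

```python
from statistics import mean


def normalize_to_pm(signals, pm_signal, background=0.0):
    """Map raw intensities so that background -> 0 and the PM probe -> 1."""
    span = pm_signal - background
    return [(s - background) / span for s in signals]


def average_over_blocks(blocks):
    """Element-wise mean over feature blocks (each block: one value per probe)."""
    return [mean(values) for values in zip(*blocks)]


def average_over_positions(grid):
    """grid[length][position] -> one mean signal per loop length."""
    return [mean(row) for row in grid]
```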

Figures 10 and 11 show that the model reproduces our experimental findings well. The parameters used here were: simulation temperature $T_{\text{sim}} = 317$ K, synthesis error rate $p = 0.084$, and energy penalty for synthesis-related defects $\Delta g_{\text{def,syn}} = -1$ kcal/mol (up to three synthesis errors per probe are considered; $\Delta g_{\text{def,syn}}$ was determined in [16]). We use the temperature-adjusted NN and MM defect parameters from [12,13] and the references therein.



Figure 10 Comparison between the experimentally observed fluorescent signals and the calculated affinities. Symbols: feature block 1, blue upward-pointing triangles; feature block 2, cyan circles; feature block 3, green downward-pointing triangles; feature block 4, magenta squares; average of the feature block signals, solid black line; prediction, solid red line. a) The predicted and experimentally obtained hybridization signals as a function of loop length for one specific loop position. PM signal is set to 1. b) The dependence of the theoretical and experimental hybridization signals on loop position for one specific loop length. PM signal is set to 1. The figures to the right in a) and b) show the experimental average (black) with 95% confidence intervals and the predicted signals (red). The experimentally observed hybridization signals are reproduced well by our theory.
available for isolated MMs, we include another parameter, $MM_{\text{def}} = -2$ kcal/mol, for the case of two adjacent MMs. We approximate two adjacent MMs as two independent synthesis defects next to each other; therefore, $MM_{\text{def}} = 2 \cdot \Delta g_{\text{def,syn}}$. Furthermore, we use the (universal) parameters for a SAW [23]. The only free parameters are the factor $C = 1.5 \cdot 10^{-3}$, which links the theoretical binding constants $K_t$ to our experimental fluorescent signals, and the probability for synthesis-related defects, $p = 0.084$. The latter is not completely free, since it is used to check whether our theory is consistent with the coupling and deprotection efficiency of the oligonucleotides used.

We note that the partition function $Z_{\text{double zipper}}(P, L)$ alone already reproduces the approximate shape of the symmetric loop defect profile shown in Figure 11b (dependence of the hybridization signal on loop position). However, the resulting binding constants are smaller than the ones calculated with $Z_{\text{extended,left}}$ and $Z_{\text{extended,right}}$, respectively. $Z_{\text{extended,left}}$ and $Z_{\text{extended,right}}$ help reproduce the shape and, moreover, the absolute values of the experimental signals (see Additional file 4: Hybridization signals resulting from $Z_{\text{double zipper}}$ and comparison to $Z_{\text{extended,right}} + Z_{\text{extended,left}}$).

Small differences between theoretical and experimental results regarding the signal dependence on loop position can be explained by the particularities of the duplex sequence under study. Here we look at two differences:

region ranging from loop position 14 to 18: this duplex region has many A/T bases, and the distance between two C bases is the largest in the whole sequence. The duplex destabilization of an A/T-rich region may be underestimated.

loop position 21: the region has many C bases, and the loop bases are inserted after two existing C bases. It has been shown [22,29,30] that degenerate base pairs may reinforce binding considerably. Stabilization by degenerate base pairs is not included in our theory.

Although there are differences between experiment and theory, the deviations are small (see Figures 10 and 11). Even better agreement could be obtained by choosing a different dependence of the duplex initiation energy on $\Delta L$. Our approximation for it (see above) only holds for short $\Delta L$, and we suppose the systematic
Figure 11 Comparison between experiment and theory after averaging over loop length and loop position, respectively. Symbols: feature block 1, blue upward-pointing triangles; feature block 2, cyan circles; feature block 3, green downward-pointing triangles; feature block 4, magenta squares; average of all feature blocks, solid black line; prediction, solid red line. a) After averaging over all loop positions, the calculated and experimental hybridization signals as a function of loop length. PM signal is set to 1. b) The dependence of the calculated and experimental hybridization signals on loop position after averaging over all loop lengths. PM signal is set to 1. The experimentally observed hybridization signals are reproduced well by our theory.
deviation between theory and experiment visible in Figure 11a to originate from this approximation. As expected, we tend to underestimate the binding constant at longer $\Delta L$. To our knowledge, no simple scheme to assess the initiation energy is known, although this is an often encountered problem. Working out the dependence of the initiation energy between the two regimes discussed above (short and very long $\Delta L$) is beyond the scope of this paper. Molecular simulations could help provide a better understanding of the nucleation process [7].

In the literature, internal DNA loops or bubbles of total length $l = l_1 + l_2$ ($l_1$: unbound bases in the probe; $l_2$: unbound bases in the target) are often treated as SAWs of that length returning to their origin [24]. Such loops occur, for example, in DNA denaturation experiments. Our experimental data could not be reproduced when the calculation was done in this way, because the calculated loop energy penalties were much too large. Treating a DNA loop as a SAW of length $l = l_1 + l_2$ returning to the origin is different from calculating the probability that two SAWs of given lengths $l_1$ and $l_2$ start at the same point and meet again at some distance. In the first case, the number of possible conformations is much higher, because the constraint is weakened to any pair $l_1', l_2'$ with $l_1' + l_2' = l$, not just the given $l_1, l_2$. The first approach could give the same results if the calculation were done under the constraint that the loop of length $l = l_1 + l_2$ reaches the position $\vec{r}$ where the two strands reunite after $l_1$ steps, similar to the approach described in [31].

This distinction may not always matter much. However, the probe sequences used throughout this study are much shorter than the DNA strands used in DNA denaturation experiments. Since the free energy of a short DNA duplex is small, the size of the loop energy penalties is more crucial.

Conclusions 

In this paper, we investigated the stability of DNA with a bulged loop. We inserted additional thymine bases into the surface-bound PM motif at a given position. By hybridizing DNA oligonucleotide targets onto the DNA microarray, bulged loops of different lengths and at different positions along the DNA duplex are formed.

We find that duplex stability decreases monotonically with the length of the bulged loop. Moreover, if the position of the bulged loop on the probe strand is varied, duplex stability varies symmetrically with respect to the center of the duplex. Duplex stability is highest for end and middle positions of the inserted bulged loop. For the theoretical prediction, we have shown that it is necessary and sufficient to consider strand opening at the position of the bulged loop. We have developed a successful approximation for the partition function of these new binding states. The signal dependence on loop length and on loop position could be reproduced with a limited amount of computing time (see Figure 11).

The employed NN free energy parameters from [12] are based on solution hybridization experiments. However, as we show in this study and in a previous paper [17], these parameters can be used to describe microarray hybridization well. The corresponding loop energy penalties can be obtained by considering the bulged loops as self-avoiding walks on a lattice.

In our simulation, we use just two free parameters:

$C = 1.5 \cdot 10^{-3}$: scaling factor, which fits the calculated binding constants to the fluorescent light intensities. This parameter cannot be avoided.

$p = 0.084$: probability of a synthesis-related defect. In a previous work, p was determined to be 0.1. Our improved experimental setup has less stray light and better resolution, which results in better synthesis quality (see Methods). Therefore, we chose p as a free parameter. The fit yields p = 0.084, in good agreement with the coupling and deprotection efficiency of the employed oligonucleotides and the achievable contrast of the optical setup [32-34]. Given this knowledge, p is not completely free, and the resulting value is used to check the consistency of our theory.

The formation of bulged loops is an important aspect that needs to be considered when analyzing DNA microarray data, or DNA hybridization of complex mixtures in general. Partly non-complementary sequences can form stable duplexes through the formation of a bulged loop, which results in false-positive signals. The investigation of these bulged loop structures is therefore necessary to gain a deeper understanding of DNA hybridization. It is also needed to make DNA microarrays and other nucleic-acid-based high-throughput technologies that rely on DNA hybridization more reliable and accurate.

Additional material 



Additional file 1: Influence of the microarray surface on the hybridization signal. We test the influence of the microarray surface on the hybridization signal by synthesizing probes with the reversed sequence (3'-CATTACAACAACCATTAATACTCATCATAACTT-5'). The 5'-end of the sequence employed throughout this work corresponds to the 3'-end of the reversed sequence. No significant influence of the surface can be detected.

Additional file 2: Duplex stability of DNA duplexes with bulged loops of different sequences as a function of loop length. Instead of the discussed poly-T loop sequences, we synthesize probes containing poly-C loop sequences and random loop sequences, respectively, at three different positions. The number of additional bases varies from one to thirteen, and the random loop sequences are listed in table b) of this file. Upon hybridization with the target sequence listed in Table 1, we note a monotonic decrease of the fluorescent signal as a function of loop length. After averaging over all loop positions, we compare the experimental signals as a function of loop length with the model predictions. We show that the experimental data are reproduced by our theory.

Additional file 3: Influence of the number of MMs on the fluorescent signal. In order to reproduce our experimental data, it is sufficient to consider up to 3 synthesis-related defects in the zipper model. We confirm this by measuring the fluorescent signals of probes with 1 to 4 MMs. MMs are incorporated into the PM probe motif at 8 given positions, resulting in 162 different probe sequences. To generate the MMs, we replace the bases at these specific positions with a thymine base (or with an adenine base, if a thymine base is already present at that position). After grouping the probes according to their number of MMs, we calculate the average signal of each group and plot it against the number of MMs (PM signal is set to 1, background signal is set to 0). Based on these data, we can estimate the error caused by neglecting probes with more than 3 synthesis defects: the error is ≈ 4%, which is smaller than the experimental error.
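The probe count quoted above follows from choosing 1 to 4 of the 8 positions: 8 + 28 + 56 + 70 = 162. The sketch below generates such a probe set with the substitution rule described in the text: T, or A where a T is already present. The PM sequence and the positions in the usage line are placeholders, not the ones used in the study, and the function name is hypothetical.

```python
from itertools import combinations
from math import comb


def mismatch_probes(pm, positions, max_mm=4):
    """All probes with 1..max_mm substitutions at the given 0-based positions."""
    probes = []
    for k in range(1, max_mm + 1):
        for subset in combinations(positions, k):
            seq = list(pm)
            for i in subset:
                seq[i] = "A" if seq[i] == "T" else "T"   # T -> A, anything else -> T
            probes.append(("".join(seq), k))             # (sequence, number of MMs)
    return probes


print(sum(comb(8, k) for k in range(1, 5)))  # 8 + 28 + 56 + 70 = 162

# Placeholder PM sequence and positions, for illustration only.
probes = mismatch_probes("GATTACAGATTACAGATTACA", [2, 4, 6, 8, 10, 12, 14, 16])
```

Every substitution changes the base at its position, so each subset of positions yields a distinct probe.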

Additional file 4: Hybridization signals resulting from $Z_{\text{double zipper}}$ and comparison to $Z_{\text{extended,right}} + Z_{\text{extended,left}}$. We compare the calculated hybridization signals resulting from $Z_{\text{double zipper}}$ to the signals resulting from $Z_{\text{extended,right}} + Z_{\text{extended,left}}$. The predicted hybridization signals are similar in shape but differ in absolute values. In the figure, the scaling factor C, which relates the predictions to the absolute signal intensities of the experiments, has been changed to 3 for $Z_{\text{double zipper}}$, compared with $C = 1.5 \cdot 10^{-3}$ throughout this study.